Scripts, Notebooks & Version Control
Thursday, September 26
Today we will…
- Answer Clarifying Questions:
  - Syllabus?
  - Questions from the reading?
- New Material
  - Scripts + Notebooks
  - Version Control
- Lab 1: Introduction to Quarto
- Challenge 1: Modifying your Quarto Document
Questions about the Syllabus
Questions from the Reading
Groupworthy Data Science
The purpose of the study is to understand how power and authority are negotiated by students when participating in pair programming tasks, as well as how an instructor’s pedagogy impacts the structure of these experiences.
Consent to Participate
If you agree to participate…
- you will be recorded once a week for 10 weeks while participating in pair programming to complete collaborative tasks.
- you will complete a pre- and post-survey about your prior computing experiences and your attitudes toward data science.
Your participation in this research will not affect your course grade.
Please complete the consent form (https://forms.gle/oax73hoe7uRSVLYw8) by Monday, 9/30.
My Collaborator
Judith Canner, Professor of Statistics
- Joined faculty at CSUMB in 2010
- Is CSUMB’s Statistics and Data Science Program Coordinator.
- Favorite classes to teach are Data Visualization and Statistical Computing.
- Learned how to program in R in 2005 when my PhD adviser handed me some code and said “modify this.”
- Goal is to see statistics and data science education that is accessible to everyone!
Scripts + Notebooks
Scripts
- Scripts (File > New File > R Script) are files of code that are meant to be run on their own.
A whole script can be run in RStudio by clicking the Source button at the top of the editor window when the script is open.
You can also run code interactively in a script by:
- highlighting lines of code and clicking Run.
- placing your cursor on a line of code and clicking Run.
- placing your cursor on a line of code and pressing Ctrl + Enter (Windows/Linux) or Command + Enter (Mac).
Notebooks
Notebooks are an implementation of literate programming.
They allow you to integrate code, output, text, images, etc. into a single document.
E.g.,
- Quarto notebook
- R Markdown notebook
- Jupyter notebook
We love notebooks because they help us produce a reproducible analysis!
What is Markdown?
Markdown is a markup language.
It uses special symbols and formatting to make pretty documents.
- *italics* – makes italics
- **bold** – makes bold text
- # – makes headers
- ![alt text](path) – includes images
- [text](url) – creates links
- < > – turns a bare URL into a clickable link
Markdown files have the .md extension.
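To see these symbols working together, here is a small Markdown file as a sketch. The headings, the image path `images/scatterplot.png`, and its alt text are hypothetical placeholders, and https://quarto.org is used only as an example URL.

```markdown
# Lab 1 Notes

## Question 1

This sentence has *italic* words and **bold** words.

![Scatterplot of mileage versus weight](images/scatterplot.png)

Read more in the [Quarto documentation](https://quarto.org).

A bare URL in angle brackets also becomes a link: <https://quarto.org>
```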
What is Quarto?
Quarto unifies and extends the R Markdown ecosystem.
Quarto files have the .qmd extension.
Highlights of Quarto
Consistent implementation of attractive and handy features across outputs:
- E.g., tabsets, code-folding, syntax highlighting, etc.
More accessible defaults and better support for accessibility.
Guardrails that are helpful when learning:
- E.g., YAML completion, informative syntax errors, etc.
Support for other languages like Python, Julia, Observable, and more.
Quarto Formats
Quarto makes moving between outputs straightforward.
- All that needs to change between these formats is a few lines in the front matter (YAML)!
Document
```yaml
title: "Lesson 1"
format: html
```
Presentation
```yaml
title: "Lesson 1"
format: revealjs
```
Website
```yaml
project:
  type: website

website:
  navbar:
    left:
      - lesson-1.qmd
```
Quarto Components
How does Quarto know that a section of text should be interpreted as R code?
R Code Options in Quarto
R code chunk options are included at the top of each code chunk, prefaced with a #| (hashpipe).
- These options control how the following code is run and reported in the final Quarto document.
- Some R code options can also be set in the front matter (YAML), where they apply globally to the entire document.
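Quarto treats text as R code when it sits inside a code chunk, which is a fence that opens with three backticks followed by `{r}` and closes with three backticks. The sketch below shows a hypothetical `lab-1.qmd` that combines both kinds of options. `execute: echo: false` in the YAML hides the code of every chunk, and the `#| echo: true` option inside one chunk overrides that setting for that chunk only. The chunk label and its contents (R's built-in `mtcars` data) are illustrative.

````markdown
---
title: "Lab 1"
format: html
execute:
  echo: false
---

```{r}
#| label: mpg-summary
#| echo: true
summary(mtcars$mpg)
```
````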
YAML Completion in Quarto
Rendering your Quarto Document
To take your .qmd file and make it look pretty, you have to render it.
Quarto CLI (command line interface) orchestrates each step of rendering:
- Process the executable code chunks with either knitr or jupyter.
- Convert the resulting Markdown file to the desired output (using Pandoc).
When you click Render:
- Your file is saved.
- The R code written in your .qmd file gets run in order.
- It starts from scratch, even if you previously ran some of the code in RStudio.
- A new file is created.
- If your Quarto file is called “Lab1.qmd”, then a file called “Lab1.html” will be created.
- This will be saved in the same folder as “Lab1.qmd”.
Version Control
A process of tracking changes to a file or set of files over time so that you can recall specific versions later.
Git vs GitHub
Git
- A system for version control that manages a collection of files in a structured way.
- Uses the command line or a GUI.
- Git is local.
GitHub
- A cloud-based service that lets you use Git across many computers.
- Basic services are free; advanced services are paid (like RStudio!).
- GitHub is remote.
Why Learn GitHub?
- GitHub provides a structured way to track changes to files over the course of a project.
  - Think Google Docs or Dropbox version history, but more structured and powerful!
- GitHub makes it easy to have multiple people working on the same files at the same time.
- You can host fun things (like the class text, these slides, the course website, etc.) as websites with GitHub Pages.
Git Repositories
Git is based on repositories.
- Think of a repository (repo) as a directory (folder) for a single project.
- This directory will likely contain code, documentation, data, to-do lists, etc. associated with the project.
- You can link a local repo with a remote copy.
Actions in Git
Cloning a Repo
Create an exact copy of a remote repo on your local machine.
Committing Changes
Tell git you have made changes you want to add to the repo.
- Also provide a commit message – a short label describing what the changes are and why they exist.
The red line is a change we commit (add) to the repo.
The log of these changes is called your commit history.
- You can always go back to old copies!
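RStudio's Git pane lets you commit with buttons. For readers curious about the command-line equivalent (for example, in RStudio's Terminal tab), here is a minimal sketch. It creates a throwaway repository in a temporary folder, makes two commits with messages, prints the commit history, and then retrieves an older copy of the file. The file name `lab-notes.md`, the author identity, and the commit messages are all hypothetical.

```shell
#!/bin/sh
set -eu

# Create a throwaway repository in a temporary directory
repo="$(mktemp -d)"
cd "$repo"
git init --quiet

# Hypothetical identity, set only for this repository
git config user.name "Example Student"
git config user.email "student@example.com"

# First change: create a file, then commit it with a message
echo "# Lab notes" > lab-notes.md
git add lab-notes.md
git commit --quiet -m "Add lab notes file"

# Second change: edit the file, then commit again
echo "Finished problem 1." >> lab-notes.md
git add lab-notes.md
git commit --quiet -m "Record progress on problem 1"

# The commit history, newest commit first (hashes differ on every machine)
git log --oneline

# Go back to an old copy: the file as it was in the first commit
git show HEAD~1:lab-notes.md
```

Each small commit becomes one line of `git log` output. This is why frequent, well-labeled commits make it easier to find where something went wrong.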
Commit Tips
- Use short, but informative commit messages.
- Commit small blocks of changes – commit every time you accomplish a small task (e.g., one problem in the lab).
- You’ll have a set of bite-sized changes (with description) to serve as a record of what you’ve done.
- With frequent commits, it's easier to find the issue if / when you mess up!
Pushing Changes
Update the copy of your repo on GitHub so it has the most recent changes you’ve made on your machine.
Pulling Changes
Update the local copy of your repo (the copy on your computer) with the version on GitHub.
Workflow
When you have an existing local repo:
- Pull the repo to make sure you have the most up-to-date version (especially if you are working on different computers).
- Make some changes locally.
- Commit the changes to git.
- Push your changes to GitHub.
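The clone → pull → change → commit → push loop can be sketched with command-line Git. To keep the sketch self-contained and runnable offline, a local "bare" repository stands in for the copy on GitHub. This is an assumption made for illustration only: with a real GitHub repo you would clone its URL instead. Two clones named `laptop` and `desktop` play the role of two computers, and all names, files, and messages are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical identity used for every commit in this sketch
export GIT_AUTHOR_NAME="Example Student" GIT_AUTHOR_EMAIL="student@example.com"
export GIT_COMMITTER_NAME="Example Student" GIT_COMMITTER_EMAIL="student@example.com"

work="$(mktemp -d)"
cd "$work"

# A local bare repository standing in for the remote copy on GitHub
git init --quiet --bare remote.git

# "Laptop": clone the (empty) remote, add a first file, push it
git clone --quiet remote.git laptop
cd laptop
echo "# Lab 1" > README.md
git add README.md
git commit --quiet -m "Add README"
git push --quiet -u origin HEAD   # -u remembers where future pushes/pulls go
cd ..

# "Desktop": clone the remote, which now contains the README
git clone --quiet remote.git desktop
cd desktop
git pull --quiet --ff-only                 # 1. pull to get the latest version
echo "Name: Example Student" >> README.md  # 2. make some changes locally
git add README.md                          # 3. commit the changes to git
git commit --quiet -m "Add my name"
git push --quiet                           # 4. push the changes to the remote
cd ..

# Back on the laptop, pulling brings in the commit made on the desktop
cd laptop
git pull --quiet --ff-only
cat README.md
```

`--ff-only` makes `git pull` refuse to merge silently if the two copies have diverged. That is one reason to pull before you start working.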
Connect GitHub to RStudio
Previous Steps
You were asked to complete the following steps before coming to class today:
- Create a GitHub account
- Introduce yourself to git (in RStudio)
- Generate a Personal Access Token (PAT)
- Store your PAT in RStudio
Verifying Your Connection
Open RStudio and run the following code in your console (lower left pane):
```r
usethis::git_sitrep()
```
You should see something like:
```
── GitHub user
• Default GitHub host: 'https://github.com'
• Personal access token for 'https://github.com': '<discovered>'
• GitHub user: 'atheobold'
• Token scopes: 'admin:org, admin:public_key, delete:packages, delete_repo, gist, notifications, repo, user, workflow, write:packages'
• Email(s): 'atheobol@calpoly.edu (primary)', 'theobold.allison970@gmail.com', '12439090+atheobold@users.noreply.github.com'
ℹ No active usethis project
```
If that is not the case, Dr. Theobold will help you troubleshoot in 5 minutes!
Accessing Lab 1
Here are step-by-step directions: Copying the Lab Assignment with GitHub Classroom in 11 Steps
Step 1: Open the Lab 1 assignment on GitHub Classroom
Step 2: Open your Lab 1 repository
Step 3: Clone the repository to your computer
Once You’ve Cloned the Repo
Step 4: Open the lab-1.qmd file
Step 5: Change your name
Step 6: Commit your change (with a nice message!)
Step 7: Push your change
Lab 1 Instructions
To do…
- Lab 1: Introduction to Quarto
  - Due Sunday (9/29) at 11:59pm
- Challenge 1: Modifying Your Quarto Document
  - Due Sunday (9/29) at 11:59pm
- Complete the Week 2 Coursework
  - Check-ins 2.1, 2.2, 2.3 due Tuesday (10/1) by the start of class